A WaCky Introduction

Authors

  • Silvia Bernardini
  • Marco Baroni
  • Stefan Evert
Abstract

We use the Web today for myriad purposes, from buying a plane ticket to browsing an ancient manuscript, from looking up a recipe to watching a TV program. And more. Besides these “proper” uses, there are also less obvious, more indirect ways of exploiting the potential of the Web. For language researchers, the Web is also an enormous collection of (mainly) textual materials that makes it possible, for the first time ever, to study innumerable instances of language performance, produced by different individuals in a variety of settings for a host of purposes. One of the tenets of corpus linguistics is the requirement to observe language as it is produced in authentic settings, for authentic purposes, by speakers and writers whose aim is not to display their language competence, but rather to achieve some objective through language. To study “purposeful language behavior”, corpus linguists require collections of authentic texts (spoken and/or written). It is therefore not surprising that many (corpus) linguists have recently turned to the World Wide Web as the richest and most easily accessible source of language material available. At the same time, for language technologists, who have long argued that “more data is better data”, the WWW is a virtually unlimited source of “more data”. The potential uses to which the Web has been (or can be) put within the field of language studies are numerous and varied, from checking word frequencies using Google counts to constructing general or specialized corpora

Similar resources

Building Large Corpora from the Web Using a New Efficient Tool Chain

Over the last decade, methods of web corpus construction and the evaluation of web corpora have been actively researched. Prominently, the WaCky initiative has provided both theoretical results and a set of web corpora for selected European languages. We present a software toolkit for web corpus construction and a set of significantly larger corpora (up to over 9 billion tokens) built using th...

The wacky hypercoagulable state of malignancy.

In this issue of Blood, Warkentin et al describe a novel clinical syndrome of warfarin-associated severe venous limb ischemia occurring in a series of 10 patients with malignancy after initiating treatment of deep venous thrombosis. Patients in this series also demonstrated a decline in platelet counts after stopping heparin, warfarin-associated supratherapeutic international normalized ratios ...

Why Chinese Web-as-Corpus is Wacky? Or: How Big Data is Killing Chinese Corpus Linguistics

This paper aims to examine and evaluate the current development of the Web-as-Corpus (WaC) paradigm in Chinese corpus linguistics. I will argue that the unstable notion of wordhood in Chinese and the resulting diverse ideas of implementing word segmentation systems have posed great challenges for those who are keen on building web-scale corpus data. Two lexical measures are proposed to illus...

Referring to one's self in the third person

Hearing someone refer to themselves in the third person may seem "wacky", but you may be surprised to learn that it can actually be pretty helpful to the person doing it. Some people find that speaking in third person improves their self-esteem, their ability to perform well under stress, to manage their emotions more favorably, and to think through complex situations in a more rational and cal...

Publication date: 2006